1  Overview


The  scope  of  high-performance  computing  is  rapidly  expanding  from  single  parallel 
systems  to  distributed  collections  of  heterogeneous  sequential  and  parallel  systems. 
Moreover,  emerging  applications  are  irregular,  with  complex,  data  dependent  execution 
behavior,  and  dynamic,  with  time  varying  resource  demands.  In  consequence,  application 
developers  increasingly  complain  that  even  small  changes  in  application  structure  can 
lead  to  large  changes  in  observed  performance. 

The  performance  sensitivity  of  current  parallel  and  distributed  systems  is  a  direct 
consequence of resource interaction complexity and the failure to recognize that resource
allocation  and  management  must  evolve  with  applications,  becoming  more  flexible  and 
resilient  to  changing  resource  availability  and  resource  demands.  Currently,  software 
developers  are  forced  to  engage  in  a  time  consuming  cycle  of  program  development, 
performance  measurement,  and  tuning  to  create  non-portable  code  that  conforms  to 
parallel  and  distributed  system  idiosyncrasies. 

Distressingly,  the  space  of  possible  performance  optimizations  is  large  and  non-convex, 
and  the  best  match  of  application  and  resource  management  technique  is  seldom  obvious 
a  priori.  Performance  instrumentation  and  analysis  provide  the  data  necessary  to 
understand  the  causes  for  poor  performance  a  posteriori,  but  alone  they  are  insufficient  to 
adapt  to  temporally  varying  application  resource  demands  and  systems  responses. 

Because  the  interactions  between  application  and  system  software  change  across 
applications  and  during  a  single  application’s  execution,  we  believe  runtime  libraries  and 
resource  management  policies  are  needed  that  can  adapt  to  rapidly  changing  application 
behavior. 

By  integrating  dynamic  performance  instrumentation  and  on-the-fly  performance  data 
reduction  with  configurable,  malleable  resource  management  algorithms  and  a  real-time 
adaptive  control  mechanism,  flexible  runtime  systems  could  automatically  choose  and 
configure  resource  management  algorithms  based  on  application  request  patterns  and 
observed system performance. Such an adaptive resource management infrastructure can
increase portability by allowing application and runtime libraries to adapt to disparate
hardware and software platforms, and can increase achieved performance by choosing and
configuring those resource management algorithms best matched to temporally varying
application behavior.

Based on this thesis, we developed Autopilot, a real-time adaptive control infrastructure.
As described below, Autopilot provides a flexible set of performance sensors, decision
procedures,  and  policy  actuators  to  realize  adaptive  control  of  applications  and  resource 
management  policies  on  both  parallel  and  wide  area  distributed  systems  (computational 
grids). 

2  Project  Approach 

Emerging  defense  and  civilian  applications  are  parallel,  distributed,  and  mobile,  are 
driven  by  real-time  data  sources,  have  time  varying  resource  demands,  and  must 
accommodate  dynamically  changing  resource  availability  (e.g.,  due  to  failures  or 
resource  contention).  At  present,  optimizing  the  performance  of  these  applications 
requires multiple iterations of ad hoc measurement, post-mortem analysis, and
platform-specific tuning, limiting performance, resilience, portability, and adaptability.

The objective of this project is to replace ad hoc, post-mortem performance optimization
with an extensible, portable, and distributed software infrastructure for real-time adaptive
control that dynamically optimizes the performance of distributed applications. Via this
cross-platform optimization infrastructure, developers can build robust, distributed,
mobile applications that are resilient to changing resource availability.

To support adaptive control, we built and validated a C++ infrastructure called Autopilot
that can be used to create nimble applications that can dynamically reconfigure their
behavior to meet changing resource conditions. Autopilot embodies the following
features:

•  Distributed performance sensors that can capture application and system
performance data and generate performance metrics.

•  Software actuators that can enable and configure application behavior and
resource management policies.

•  Decision procedures for selecting resource management policies and enabling
actuators based on observed application resource requests and the system
responses captured by performance sensors.

•  Distributed name servers that support registration by remote sensors and
actuators and property-based requests for sensors and actuators by remote
clients.

•  Sensor and actuator clients that interact with remote sensors and actuators,
monitoring sensor data and issuing commands to actuators.

•  Desktop performance visualization tools to provide analysts insight into the
interaction of application demands and resource management algorithm
response.

In this design, performance instrumentation sensors capture and compute quantitative
application and system performance metrics. These data are used by decision procedures to
choose and configure resource management policies via software actuators.
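
The sensor, decision procedure, and actuator roles described above can be illustrated with a minimal Python sketch. This is not Autopilot's C++ API: the class names, the `control_step` helper, the policy names, and the 64 KiB threshold are all assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sensor:
    """Hypothetical sensor: records raw samples and reports a derived metric."""
    samples: List[float] = field(default_factory=list)

    def record(self, value: float) -> None:
        self.samples.append(value)

    def metric(self) -> float:
        # The metric is the mean of the recorded samples (an assumption).
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


@dataclass
class Actuator:
    """Hypothetical actuator: holds the currently selected policy name."""
    policy: str = "default"

    def set_policy(self, policy: str) -> None:
        self.policy = policy


def control_step(sensor: Sensor, actuator: Actuator,
                 decide: Callable[[float], str]) -> str:
    """One pass of the loop: read the metric, decide, and actuate."""
    actuator.set_policy(decide(sensor.metric()))
    return actuator.policy


def choose_io_policy(mean_request_bytes: float) -> str:
    # Illustrative threshold rule; the 64 KiB cutoff is invented for this sketch.
    return "prefetch" if mean_request_bytes >= 64 * 1024 else "cache_small_blocks"
```

In this sketch the decision procedure is an ordinary function passed to `control_step`, so a different rule can be substituted without touching the sensor or the actuator.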

We tested the Autopilot infrastructure by developing an adaptive, high-performance I/O
toolkit called PPFS II.* Our earlier measurements of application I/O patterns and file
system responses showed that achievable performance was strongly sensitive to small
changes in either access patterns or policies. By using Autopilot sensors to measure I/O
patterns, together with fuzzy logic rules for file system policy selection and actuators for
policy configuration, PPFS II yielded dramatic improvements to I/O performance.

We also validated Autopilot using a large-scale parallel application. This application,
developed by the DOE ASCI Center for the Simulation of Advanced Rockets (CSAR), is
written in Fortran 90 using MPI. Although the solid rocket burn requires only two
minutes, simulating 0.5 seconds of the burn is estimated to require 200 hours on a
128-node SGI Origin2000. With the high cost of rocket failures, optimizing this simulation
is of great practical importance.

*  Primary funding for PPFS II development was from other sources.

We instrumented the CSAR code by inserting Autopilot sensors that (a) compute the
number of invocations and time spent in each procedure of the code’s call graph and (b)
capture data on the execution of logical code regions (e.g., initialization, fluids, solids,
and output). We then combined this sensor data with actuators to control remote
performance data capture in real time. This integrated, real-time performance
visualization and control experiment represents a major validation of Autopilot and
provides functionality not present in other distributed/parallel performance measurement
toolkits.
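
As a rough illustration of the per-procedure measurement described above (invocation counts and time spent), the following Python sketch wraps functions with a small profiling sensor. The CSAR code itself is Fortran 90, and the Autopilot sensors are C++; the `ProcedureProfile` class and its method names are hypothetical and chosen only for this example.

```python
import time
from collections import defaultdict
from functools import wraps


class ProcedureProfile:
    """Hypothetical sensor that tallies calls and elapsed time per procedure."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)

    def instrument(self, func):
        """Wrap func so that each invocation updates the profile."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Update the tallies even if the procedure raises.
                self.calls[func.__name__] += 1
                self.seconds[func.__name__] += time.perf_counter() - start
        return wrapper

    def snapshot(self):
        """Return the reduced data that a remote client would receive."""
        return {name: (self.calls[name], self.seconds[name])
                for name in self.calls}
```

A procedure is instrumented by decorating it with `@profile.instrument`, where `profile` is a `ProcedureProfile` instance; `profile.snapshot()` then returns a dictionary mapping each procedure name to its call count and accumulated seconds.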


3  Autopilot  Software  Overview 

Autopilot  is  implemented  as  a  set  of  C++  classes  and  is  built  atop  the  DARPA-funded 
Globus  wide-area  computing  toolkit.  Globus  provides  a  shared  address  space  across  local 
and  wide-area  networks  and  enables  interprocess,  intraprocess,  and  intermachine  data 
sharing.  Globus  also  supports  heterogeneity,  allowing  a  single  computation  to  use 
multiple  communication  protocols,  executables,  and  programming  models. 

Using  Globus  as  a  base,  the  Autopilot  toolkit  defines  sensors,  actuators,  decision 
procedures, and sensor/actuator managers, all accessible via Globus “global pointers”;
see Figure 1. Sensors are low-overhead routines designed to capture real-time
performance  data  from  distributed  software  components  and  can  be  extended  via  user- 
defined  functions  to  process  raw  performance  data  before  transmission  (e.g.,  computing  a 
profile  from  event  trace  data).  In  turn,  actuators  define  a  mechanism  for  implementing 
control  functions  in  software  modules  (e.g.,  changing  policies  or  policy  parameters  in 
response to remote decision procedures). Sensor/actuator managers can be viewed as
access points for sensors and actuators.

Decision  procedures  implement  fuzzy  logic  control  of  distributed  software.  They  accept 
and  evaluate  real-time  data  from  one  or  more  sensors  and  generate  actuator  outputs  in 
response.  Our  fuzzy  logic  toolkit  is  based  on  a  subset  of  the  publicly  available  ComNets 
Class  Library  (CNCL),  with  modifications  and  extensions  for  software  control. 
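
To make the idea of a fuzzy logic decision procedure concrete, here is a minimal two-rule controller in Python. It does not use or reproduce the CNCL API; the membership functions, the 4 KiB and 64 KiB breakpoints, the rule outputs, and the name `prefetch_depth` are assumptions made for illustration. Defuzzification uses a simple weighted average of the rule outputs.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Membership falling linearly from 1 at lo to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def prefetch_depth(request_kib):
    """Two-rule fuzzy controller with weighted-average defuzzification.

    Rule 1: IF request size is SMALL THEN prefetch depth is 0 blocks.
    Rule 2: IF request size is LARGE THEN prefetch depth is 8 blocks.
    """
    small = ramp_down(request_kib, 4.0, 64.0)
    large = ramp_up(request_kib, 4.0, 64.0)
    rules = [(small, 0.0), (large, 8.0)]
    # The two memberships always sum to 1, so the divisor is never zero.
    total = sum(weight for weight, _ in rules)
    return sum(weight * output for weight, output in rules) / total
```

Because the two memberships overlap, the output changes smoothly between policies as the sensed request size moves from small to large, rather than switching abruptly at a single threshold.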

Using  sensors  and  actuators  embedded  in  distributed  code,  one  can  adaptively  control 
software  behavior  either  interactively  or  via  fuzzy  logic  decision  procedures.  Both 
decision  procedures  and  interactive  systems  can  query  managers  with  sensor  and  actuator 
attributes and retrieve global pointers to all sensors and actuators that match the specified
attributes. Autopilot’s Java-based Autodriver desktop interface, shown in Figure 2, allows
users to attach dynamically to sensors and actuators, display time-varying sensor data,
and  change  actuator  values. 
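
The attribute-based lookup that managers provide can be sketched as a small registry. This is an illustrative Python model only: the `Manager` class, its `register` and `lookup` methods, and the attribute names used below are hypothetical, and plain strings stand in for Globus global pointers.

```python
class Manager:
    """Hypothetical registry mapping attribute sets to sensor/actuator handles."""

    def __init__(self):
        self._entries = []  # list of (attributes dict, handle) pairs

    def register(self, handle, **attributes):
        """Record a handle together with its descriptive attributes."""
        self._entries.append((dict(attributes), handle))

    def lookup(self, **wanted):
        """Return every handle whose attributes include all wanted pairs."""
        return [handle for attrs, handle in self._entries
                if all(attrs.get(key) == value for key, value in wanted.items())]
```

For example, after registering handles with attributes such as `kind="sensor"` and `host="node2"`, a client could call `lookup(kind="sensor", host="node2")` to retrieve only the matching handles.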

4  Project  Accomplishments  and  Impact 

The  Autopilot  research  and  resulting  toolkit  are  recognized  as  major  components  of  the 
nascent,  multi-agency  computational  grid  program.  In  particular,  the  Autopilot  toolkit  is  a 
targeted  technology  of  the  National  Computational  Science  Alliance’s  Partnership  for  an 
Advanced  Computational  Infrastructure  (PACI)  focus  on  wide-area  computation  and 
distributed  collaboration.  The  Autopilot  software  targets  optimization  of  computation 
behavior when (a) executing on nation-scale distributed resources via the Globus toolkit
and  (b)  using  high-performance  I/O  systems  on  PC  clusters. 

Autopilot  is  also  a  major  component  of  the  DoE  nuclear  weapons  stockpile  stewardship 
program, via the Accelerated Strategic Computing Initiative (ASCI). In particular,
Autopilot technology is a part of DoE high-performance computing software research and
development efforts. Similarly, our PPFS II adaptive I/O library, based on Autopilot,
targets both analysis of I/O patterns in ASCI codes and dynamic optimization of these I/O
patterns. Working with LLNL, LANL, and SNL, we are instrumenting laboratory codes
and ASCI I/O libraries and developing adaptive I/O policies that support patterns found
in these codes and libraries. In addition, lessons from PPFS II and Autopilot are helping
drive  creation  of  data  and  visualization  corridors  for  the  ASCI  program.  These  corridors 
will  support  distributed  analysis  and  visualization  of  petabyte  data  sets. 

Autopilot  is  the  basis  for  wide-area  network  tuning  and  optimization  with  the  Globus 
toolkit, itself the base for the multiagency (NSF, NASA, DoE, and DARPA)
computational  grid.  As  part  of  the  DoE  NGI  effort,  we  will  be  further  integrating 
Autopilot  and  Globus  for  adaptive  network  control. 

We also are working closely with contractors for the DoD High-Performance Computing
Modernization Program (HPCMP),^ who are deploying our performance tools at DoD
Major Shared Resource Centers (e.g., at CEWES) and using them to help DoD application
developers optimize their codes.

5  Software  Availability 

During  its  lifetime,  the  project  released  two  major  versions  of  the  Autopilot  software 
toolkit.  Information  on  how  to  obtain  the  current  version  of  the  Autopilot  software  can  be 
obtained via the World Wide Web at:

http://www-pablo.cs.uiuc.edu/Proiects/Autopilot/AutopilotOverview.htm 


^  See  http://www.hpcmo.hpc.mil  for  details. 